Algebraic Multigrid on a Dragonfly Network: First Experiences on a Cray XC30

Authors

  • Hormozd Gahvari
  • William Gropp
  • Kirk E. Jordan
  • Martin Schulz
  • Ulrike Meier Yang
Abstract

The Cray XC30 represents the first appearance of the dragonfly interconnect topology in a product from a major HPC vendor. The question of how well applications perform on such a machine naturally arises. We consider the performance of an algebraic multigrid solver on an XC30 and develop a performance model for its solve cycle. We use this model both to analyze the solver's performance and to guide runtime data redistribution that aims to improve it by trading messages for increased computation. The performance modeling results demonstrate the ability of the dragonfly interconnect to avoid network contention, but the speedups obtained with the redistribution scheme were enough to raise questions about the ability of the dragonfly topology to handle very communication-intensive applications.

Related resources

Comparative Performance Analysis of Coarse Solvers for Algebraic Multigrid on Multicore and Manycore Architectures

We study the performance of a two-level algebraic-multigrid algorithm, with a focus on the impact of the coarse-grid solver on performance. We consider two algorithms for solving the coarse-space systems: the preconditioned conjugate gradient method and a new robust HSS-embedded low-rank sparse-factorization algorithm. Our test data comes from the SPE Comparative Solution Project for oil-reservo...

A CUDA Implementation of the High Performance Conjugate Gradient Benchmark

The High Performance Conjugate Gradient (HPCG) benchmark has been recently proposed as a complement to the High Performance Linpack (HPL) benchmark currently used to rank supercomputers in the Top500 list. This new benchmark solves a large sparse linear system using a multigrid preconditioned conjugate gradient (PCG) algorithm. The PCG algorithm contains the computational and communication patt...

Performance Measurements of the NERSC Cray Cascade System

Cray began delivery of their next-generation XC30 supercomputer systems in late 2012. One of the first systems, “Edison,” was delivered to NERSC, and in this paper we present preliminary performance results obtained on this machine. The primary new feature of the XC30 architecture is the Cray “Aries” interconnect that includes a 48-port high radix router with a dragonfly topology. To demonstrate...

Optimising Hydrodynamics applications for the Cray XC30 with the application tool suite

Power constraints are forcing HPC systems to continue to increase hardware concurrency. Efficiently scaling applications on future machines will be essential for improved science and it is recognised that the “flat” MPI model will start to reach its scalability limits. The optimal approach is unknown, necessitating the use of mini-applications to rapidly evaluate new approaches. Reducing MPI ta...

A Roofline Performance Analysis of an Algebraic Multigrid PDE Solver

We present a performance analysis of a novel element-based algebraic multigrid (AMGe) method combined with a robust coarse-grid solution technique based on HSS low-rank sparse factorization. Our test datasets come from the SPE Comparative Solution Project for oil reservoir simulations. The current performance study focuses on one multicore node and on bound analysis using the roofline technique....

Publication date: 2014